{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1.6 信息论"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 熵"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "考虑一个离散变量 $x$。\n",
    "\n",
    "对于该变量的一个观测值，我们需要一个量来衡量从这个观测量中得到了多少关于 $x$ 的信息。\n",
    "从主观意义上来说，一个不常见的观测值所提供的新信息要比常见的观测值要多。\n",
    "\n",
    "我们的信息量 $h$ 依赖于一个概率分布 $p(x)$，$h(x)$ 需要满足这样的性质：\n",
    "- 关于 $p(x)$ 是单调（递减）的，高概率对应的信息量低，低概率对应的信息量高；\n",
    "- 假设两个事件 $x， y$ 是不相关的，即 $p(x, y)=p(x)p(y)$，那么从事件 $(x, y)$ 中获得的信息量应该等于单独从 $x$ 和 $y$ 事件中获得的信息量之和，即 $h(x, y)=h(x) + h(y)$；\n",
    "\n",
    "我们将 $h$ 和 $p$ 之间的关系表示为 $h(p)$，从上面的例子中，我们可以看到\n",
    "\n",
    "$$\n",
    "h(pq) = h(p) h(q)\n",
    "$$\n",
    "\n",
    "归纳法不难给出 $h(p^n) = nh(p)$，所以\n",
    "\n",
    "$$\n",
    "h(p^{n/m}) = nh(p^{1/m}) = \\frac n m h(p^{m/m}) = \\frac n m h(p)\n",
    "$$\n",
    "\n",
    "因此由连续性，我们有 $h(p^x) = x h(p)$。\n",
    "\n",
    "对任意正数 $p,q$，总有 $q^x = p$ 成立，则我们有\n",
    "\n",
    "$$\n",
    "\\frac{h(p)}{\\ln (p)} = \\frac{h(q^x)}{\\ln (p^x)} = \\frac{xh(q)}{x\\ln (q)} = \\frac{h(q)}{\\ln (q)}\n",
    "$$\n",
    "\n",
    "从而 $h(p) \\propto \\ln p$，即 $h$ 与 $p$ 的对数成正比。\n",
    "\n",
    "一个可能的定义为 \n",
    "\n",
    "$$\n",
    "h(x) = - \\log_2 p(x)\n",
    "$$\n",
    "\n",
    "负号保证信息量为正且满足单调递减的要求，在这里，我们选取了 2 作为对数的底，因此 $h(x)$ 的单位为比特（`bit`）。\n",
    "\n",
    "有了对于单一的 $x$ 的信息量，更一般的想法是衡量一组 $x$ 的信息量，因此，我们考虑对 x 求期望，得到熵（`entropy`）：\n",
    "\n",
    "$$\n",
    "H[x]=-\\sum_x p(x) \\log_2 p(x)\n",
    "$$\n",
    "\n",
    "因为 $\\lim_{p\\to 0} plnp = 0$，所以如果 $p(x)=0$，那么 $p(x)\\ln p(x)=0$"
   ]
  },
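  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick numerical check of these definitions (a minimal sketch; the helper names `information` and `entropy` are ours, not from any library):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def information(p):\n",
    "    # information content h(x) = -log2 p(x) of an event with probability p\n",
    "    return -np.log2(p)\n",
    "\n",
    "def entropy(probs):\n",
    "    # H[x] = -sum_x p(x) log2 p(x), with p log p taken as 0 when p = 0\n",
    "    probs = np.asarray(probs, dtype=float)\n",
    "    nz = probs[probs > 0]\n",
    "    return float(-np.sum(nz * np.log2(nz)))\n",
    "\n",
    "print(information(1.0 / 8))    # 3.0: a rarer event carries more information\n",
    "print(entropy([1.0 / 8] * 8))  # 3.0 bits for 8 equally likely states\n",
    "```"
   ]
  },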
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 编码长度和熵"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在假设我们要传送状态变量给一个接收器。\n",
    "\n",
    "考虑一个有 8 个状态的变量 $x$，每个状态都是等概率的。为了表示这个状态，我们至少需要 3 比特的信息来表示状态，此时熵值为\n",
    "\n",
    "$$\n",
    "H[x] = - 8 \\times \\frac 1 8 \\log_2 \\frac 1 8 = 3~\\text{bits}\n",
    "$$\n",
    "\n",
    "再考虑不等概率的情况，假设状态和概率为\n",
    "\n",
    "|a|b|c|d|e|f|g|h|\n",
    "|---|---|---|---|---|---|---|---|\n",
    "|$\\frac{1}{2}$|$\\frac{1}{4}$|$\\frac{1}{8}$|$\\frac{1}{16}$|$\\frac{1}{64}$|$\\frac{1}{64}$|$\\frac{1}{64}$|$\\frac{1}{64}$|\n",
    "\n",
    "熵为\n",
    "\n",
    "$$\n",
    "H[x] = - \\frac{1}{2} \\log_2 \\frac{1}{2} - \\frac{1}{4}\\log_2 \\frac{1}{4} - \\frac{1}{8} \\log_2 \\frac{1}{8} - \\frac{1}{16} \\log_2 \\frac{1}{16} - \\frac{4}{64} \\log_2 \\frac{1}{64} = 2~\\text{bits}\n",
    "$$\n",
    "\n",
    "不等概率的熵要比等概率的熵要小。\n",
    "\n",
    "考虑字符的霍夫曼编码：\n",
    "\n",
    "|a|b|c|d|e|f|g|h|\n",
    "|---|---|---|---|---|---|---|---|\n",
    "|0|10|110|1110|111100|111101|111110|111111|\n",
    "\n",
    "其平均字符长度为：\n",
    "\n",
    "$$\n",
    "\\frac{1}{2} \\times 1 + \\frac{1}{4} \\times 2 + \\frac{1}{8} \\times 3 + \\frac{1}{16} \\times 4 + \\frac{4}{64} \\times 6 = 2~\\text{bits}\n",
    "$$\n",
    "\n",
    "因此，熵和霍夫曼编码长度之间存在一定的关系。\n",
    "\n",
    "`Shannon` 在 1948 年的 `noiseless coding theorem` 说明，这样的熵是编码长度的下界。"
   ]
  },
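  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both numbers above can be checked directly (a minimal sketch assuming `numpy`):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "probs = [0.5, 0.25, 0.125, 0.0625, 1.0 / 64, 1.0 / 64, 1.0 / 64, 1.0 / 64]\n",
    "codes = ['0', '10', '110', '1110', '111100', '111101', '111110', '111111']\n",
    "\n",
    "H = -sum(p * np.log2(p) for p in probs)            # entropy of the distribution\n",
    "L = sum(p * len(c) for p, c in zip(probs, codes))  # expected code length\n",
    "\n",
    "print(H, L)  # both 2.0: this code meets the entropy lower bound exactly\n",
    "```\n",
    "\n",
    "The bound is met exactly here because every probability is a power of 2, so the optimal code lengths $-\\log_2 p$ are all integers."
   ]
  },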
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 自然对数熵"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在考虑自然对数表示的熵。此时熵的单位是 `nat` 而不是 `bit`。\n",
    "\n",
    "我们之前用平均信息量的方式引入了熵，事实上熵的概念是从物理上引入的。\n",
    "\n",
    "假设我们将 $N$ 个相同的小球放入一些容器中，使得第 $i$ 个容器中有 $n_i$ 个球，$\\sum_i n_i = N$。\n",
    "\n",
    "我们按顺序从第 1 个容器开始往里面放入小球，第 1 个有 $N$ 种选择，第 2 个有 $N-1$ 种选择……总共有 $N!$ 种选择。\n",
    "\n",
    "但是放入每个容器中的 $n_i$ 个球是不区分顺序的，有 $n_i!$ 种可能性放置这些小球，因此总的可能方法有\n",
    "\n",
    "$$\n",
    "W = \\frac{N!}{\\prod_i n_i!}\n",
    "$$\n",
    "\n",
    "定义熵为 \n",
    "\n",
    "$$\n",
    "H = \\frac{1}{N} \\ln W = \\frac 1 N\\ln N! - \\frac 1 N \\sum_{i} \\ln n_i !\n",
    "$$\n",
    "\n",
    "当 $N \\to \\infty$ 时，由 `Stirling` 近似公式：\n",
    "\n",
    "$$\n",
    "\\ln N! \\approx N \\ln N - N\n",
    "$$\n",
    "\n",
    "我们有\n",
    "\n",
    "$$\n",
    "H = \\lim_{N\\to \\infty} \\frac 1 N (N\\ln N - N) - \\frac 1 N \\sum_{i} (n_i \\ln n_i - n_i) \n",
    "= -\\lim_{N\\to \\infty} \\sum_i \\left(\\frac{n_i}{N}\\right) \\ln \\left(\\frac{n_i}{N}\\right)\n",
    "= -\\sum_i p_i \\ln p_i\n",
    "$$\n",
    "\n",
    "$p_i = \\lim_{N\\to\\infty} \\left(\\frac{n_i}{N}\\right)$ 是小球放入第 i 个容器的概率。\n",
    "\n",
    "在物理上，容器中小球的分布叫做微观状态，各个容器中的小球数目叫做宏观状态。\n",
    "\n",
    "对于一个离散随机变量 $X$ 的取值 $x_i$ 当作一个容器，即有 $p(X=x_i)=p_i$，则它的熵为：\n",
    "\n",
    "$$\n",
    "H[p] = - \\sum_i p(x_i) \\ln p(x_i)\n",
    "$$\n",
    "\n",
    "一般来说，一个离散分布的点越多，分布越均匀，那么对应的熵也越大。\n",
    "\n",
    "如果其中有一个 $p_i=1$，那么熵取到最小值 0，最大值可以用 Lagrange 乘子求解，Lagrange 函数为\n",
    "\n",
    "$$\n",
    "\\tilde H = - \\sum_i p(x_i) \\ln p(x_i) + \\lambda \\left(\\sum_i p(x_i) - 1\\right)\n",
    "$$\n",
    "\n",
    "最大化这个函数得到：\n",
    "\n",
    "$$\n",
    "\\frac{\\partial \\tilde H}{\\partial p(x_i)} = -1 -\\ln p(x_i) + \\lambda = 0\n",
    "$$\n",
    "\n",
    "得到最大熵应该满足一个均匀离散分布 $p(x_i)=p(x_j), \\forall i,j$。\n",
    "\n",
    "若状态总数为 $M$，则 $p(x_i) = \\frac 1 M$，此时，我们有 $H = \\ln M$。\n",
    "\n",
    "为了验证的确是个最大值，我们可以计算它的二阶导数来验证。"
   ]
  },
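  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can check numerically that $\\frac{1}{N}\\ln W$ approaches $-\\sum_i p_i \\ln p_i$ as $N$ grows (a minimal sketch; `lgamma(n + 1)` from the standard library computes $\\ln n!$):\n",
    "\n",
    "```python\n",
    "from math import lgamma, log\n",
    "\n",
    "def log_multiplicity(counts):\n",
    "    # ln W = ln N! - sum_i ln n_i!, computed stably via lgamma(n + 1) = ln n!\n",
    "    N = sum(counts)\n",
    "    return lgamma(N + 1) - sum(lgamma(n + 1) for n in counts)\n",
    "\n",
    "def entropy_nats(probs):\n",
    "    return -sum(p * log(p) for p in probs if p > 0)\n",
    "\n",
    "probs = [0.5, 0.25, 0.25]\n",
    "for N in (100, 10000, 1000000):\n",
    "    counts = [int(round(p * N)) for p in probs]\n",
    "    print(log_multiplicity(counts) / N)  # approaches the entropy in nats\n",
    "\n",
    "print(entropy_nats(probs))\n",
    "```"
   ]
  },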
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 连续变量的熵"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于连续变量，我们将 $x$ 划分为长度为 $\\Delta$ 的小段，积分中值定理告诉我们，在第 $i$ 小段 $(i\\Delta, (i+1)\\Delta)$ 中存在 $x_i$，使得：\n",
    "\n",
    "$$\n",
    "\\int_{i\\Delta}^{i+1\\Delta} p(x) dx = p(x_i) \\Delta \n",
    "$$\n",
    "\n",
    "我们可以将落在第 $i$ 小段的 $x$ 对应为 $x_i$，则观测到 $x_i$ 的概率为 $p(x_i) \\Delta$，这给出了一个关于 $x_i$ 的离散分布，其熵为：\n",
    "\n",
    "$$\n",
    "H_{\\Delta} = - \\sum_i p(x_i) \\Delta \\ln(p(x_i) \\Delta) = - \\sum_i p(x_i) \\Delta \\ln(p(x_i)) - \\ln\\Delta\n",
    "$$\n",
    "\n",
    "其中 $p(x_i) \\Delta= 1$，现在忽略最后的常数项，然后考虑 $\\Delta \\to0$ 的情况，我们有\n",
    "\n",
    "$$\n",
    "-\\lim_{\\Delta \\to 0} \\left\\{ \\sum_i p(x_i) \\Delta \\ln(p(x_i) \\Delta) \\right\\} = -\\int p(x) \\ln p(x) dx\n",
    "$$\n",
    "\n",
    "这一项叫做微分熵（`differential entropy`），它与离散熵的差别在于一个 $\\ln \\Delta$，在 $\\Delta \\to0$ 时会发散的部分。\n",
    "\n",
    "对于多元向量 $\\mathbf x$，微分熵定义为\n",
    "\n",
    "$$\n",
    "H[\\mathbf x]=-\\int p(\\mathbf x) \\ln p(\\mathbf x) d\\mathbf x\n",
    "$$\n",
    "\n",
    "\n",
    "在离散分布的例子中，我们看到最大化熵的分布是均匀离散分布。\n",
    "\n",
    "为了最大化连续分布的熵，我们需要加上 $p(x)$ 的一阶矩和二阶矩都固定的条件。\n",
    "\n",
    "考虑一维情况，我们的问题为\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "\\max_{p(x)}~& -\\int p(x) \\ln p(x) dx \\\\\n",
    "s.t.~& 1 = \\int_{-\\infty}^{\\infty} p(x) dx  \\\\\n",
    "& \\mu = \\int_{-\\infty}^{\\infty} xp(x) dx\\\\\n",
    "& \\sigma^2 = \\int_{-\\infty}^{\\infty} (x-\\mu)^2p(x) dx\\\\\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "\n",
    "`Lagrange` 函数为\n",
    "\n",
    "$$\n",
    "-\\int p(x) \\ln p(x) dx + \\lambda_1 \\left(\\int_{-\\infty}^{\\infty} p(x) dx - 1\\right) + \\lambda_2 \\left(\\int_{-\\infty}^{\\infty} xp(x) dx - \\mu\\right) + \\lambda_3 \\left(\\int_{-\\infty}^{\\infty} (x-\\mu)^2p(x) dx - \\sigma^2) \\right)\n",
    "$$\n",
    "\n",
    "令这个泛函对 $p(x)$ 的导数为 `0`，得到\n",
    "\n",
    "$$\n",
    "1-\\ln p(x)+\\lambda_1 + \\lambda_2 x + \\lambda_3 (x-\\mu)^2 = 0\n",
    "$$\n",
    "\n",
    "从而\n",
    "\n",
    "$$\n",
    "p(x) = \\exp\\left\\{1+\\lambda_1 + \\lambda_2 x + \\lambda_3 (x-\\mu)^2\\right\\}\n",
    "$$\n",
    "\n",
    "将这个形式带入约束条件，不难得到\n",
    "\n",
    "$$\n",
    "p(x) = \\frac{1}{(2\\pi\\sigma^2)^{1/2}} \\exp\\left\\{-\\frac{(x-\\mu)^2}{2\\sigma^2}\\right\\}\n",
    "$$\n",
    "\n",
    "即最大熵的分布是高斯分布。\n",
    "\n",
    "注意到，我们并没有对 $p(x)$ 做非负的约束，但是最大化给出的结果是一个非负的结果。\n",
    "\n",
    "高斯分布的熵为：\n",
    "\n",
    "$$\n",
    "\\frac{1}{2} \\left\\{1+\\ln(2\\pi\\sigma^2)\\right\\}\n",
    "$$\n",
    "\n",
    "从这个表达式我们可以看出，与离散熵不同，连续熵可以为负值：$\\sigma^2 < \\frac{1}{2\\pi e}, H(x) < 0$，原因是概率密度函数 p(x) 没有小于 1 的限制，因此 $-p(x)\\ln p(x)$ 不能保证非负。"
   ]
  },
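  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The closed form $\\frac{1}{2}\\{1+\\ln(2\\pi\\sigma^2)\\}$, and the fact that it turns negative for small $\\sigma^2$, can be verified by numerical integration (a minimal sketch assuming `numpy`; the helper names are ours):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def differential_entropy(pdf, xs, dx):\n",
    "    # approximate the integral of -p(x) ln p(x) by a Riemann sum on the grid xs\n",
    "    p = pdf(xs)\n",
    "    integrand = np.where(p > 0, -p * np.log(np.where(p > 0, p, 1.0)), 0.0)\n",
    "    return float(np.sum(integrand) * dx)\n",
    "\n",
    "def gaussian_pdf(mu, sigma):\n",
    "    return lambda x: np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)\n",
    "\n",
    "xs, dx = np.linspace(-10, 10, 100001, retstep=True)\n",
    "for sigma in (1.0, 0.1):\n",
    "    numeric = differential_entropy(gaussian_pdf(0.0, sigma), xs, dx)\n",
    "    closed = 0.5 * (1 + np.log(2 * np.pi * sigma ** 2))\n",
    "    print(sigma, numeric, closed)  # negative when sigma^2 < 1/(2*pi*e)\n",
    "```"
   ]
  },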
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 条件熵"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "假设我们有 $\\mathbf{x,y}$ 的联合分布，假设 $\\mathbf x$ 已知，那么 $\\mathbf y$ 的包含的信息量应当由 $-\\ln p(\\mathbf{y|x})$ 来衡量，因此我们可以定义其条件期望为给定 $\\mathbf x$ 下 $\\mathbf y$ 的条件熵（`conditional entropy`）：\n",
    "\n",
    "$$\n",
    "H[\\mathbf{y|x}] = -\\iint p(\\mathbf{x,y}) \\ln(\\mathbf{y|x}) d\\mathbf xd\\mathbf y\n",
    "$$\n",
    "\n",
    "可以计算出，我们有：\n",
    "\n",
    "$$\n",
    "H[\\mathbf{x, y}] = H[\\mathbf{y|x}] + H[\\mathbf{x}]\n",
    "$$\n",
    "\n",
    "因此，用来描述 $\\mathbf{x,y}$ 的信息量等于描述 $\\mathbf x$ 的信息量加上给定 $\\mathbf x$ 的条件下描述 $\\mathbf y$ 所需的信息量。"
   ]
  },
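  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The identity $H[\\mathbf{x, y}] = H[\\mathbf{y|x}] + H[\\mathbf{x}]$ is easy to verify on a small discrete joint distribution (a minimal sketch assuming `numpy`; the table of probabilities is made up for illustration):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# a joint distribution p(x, y): rows index x, columns index y\n",
    "joint = np.array([[0.2, 0.1, 0.1],\n",
    "                  [0.1, 0.3, 0.2]])\n",
    "\n",
    "def H(p):\n",
    "    # entropy (in bits) of a discrete distribution given as an array\n",
    "    p = np.asarray(p, dtype=float).ravel()\n",
    "    p = p[p > 0]\n",
    "    return float(-np.sum(p * np.log2(p)))\n",
    "\n",
    "px = joint.sum(axis=1)                           # marginal p(x)\n",
    "cond = joint / px[:, None]                       # p(y | x)\n",
    "H_cond = float(-np.sum(joint * np.log2(cond)))   # H[y|x]\n",
    "\n",
    "print(H(joint), H(px) + H_cond)  # equal: H[x, y] = H[x] + H[y|x]\n",
    "```"
   ]
  },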
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.6.1 相对熵和互信息"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "假设我们有一个未知分布 $p(\\mathbf x)$，然后对它建模得到了一个概率分布 $q(\\mathbf x)$。\n",
    "\n",
    "用 $q(\\mathbf x)$ 的分布来传送 $x$ 所用的信息量（$q(\\mathbf x)$），比用真实的分布 $p(\\mathbf x)$ 传送所用的信息量（$p(\\mathbf x)$）要多一些，多出来的部分为：\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "KL(p\\|q) \n",
    "&= - \\int p(\\mathbf x)\\ln(q(\\mathbf x)) dx - \\left(-\\int p(\\mathbf x)\\ln p(\\mathbf x)\\right)\\\\\n",
    "&= - \\int p(\\mathbf x)\\ln \\left\\{\\frac{q(\\mathbf x)}{p(\\mathbf x)}\\right\\}\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "这叫做相对熵（`relative entropy`）或者 `Kullback-Leibler` 散度（`Kullback-Leibler divergence`）。\n",
    "\n",
    "`K-L` 散度不是对称的，即 $KL(p\\|q) \\neq KL(q\\|p)$。\n",
    "\n",
    "`K-L` 散度满足 $KL(p\\|q) \\geq 0$，当且仅当 $p(\\mathbf x)=q(\\mathbf x)$ 时取等号。"
   ]
  },
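  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small numerical illustration of these two properties (a minimal sketch; `kl` is our own helper for discrete distributions):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def kl(p, q):\n",
    "    # KL(p || q) = sum_x p(x) ln(p(x) / q(x)), with 0 ln 0 taken as 0\n",
    "    p = np.asarray(p, dtype=float)\n",
    "    q = np.asarray(q, dtype=float)\n",
    "    mask = p > 0\n",
    "    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))\n",
    "\n",
    "p = [0.5, 0.3, 0.2]\n",
    "q = [0.2, 0.5, 0.3]\n",
    "\n",
    "print(kl(p, q), kl(q, p))  # both positive and unequal: KL is not symmetric\n",
    "print(kl(p, p))            # 0.0: it vanishes iff the distributions coincide\n",
    "```"
   ]
  },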
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 凸函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果一个函数的弦（`chord`）始终在函数上或者函数上方，那么这个函数就是凸函数。\n",
    "\n",
    "一个凸函数的例子如图所示："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAAEDCAYAAACWDNcwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHbhJREFUeJzt3Xl8VNX9//HXYZFFFkFJxBWrIAmIbBYQlQBSXBCprbiC\nWxUSNQooWlEJrohgAREirlVRvli1VLRVYgkgiIgCCqioRcUFI6ABEtbk/P44yI8120zumTvzfj4e\neSB3xvBh5ubNnXPP+RxjrUVERIJRxXcBIiKJRKErIhIgha6ISIAUuiIiAVLoiogESKErIhKgaiU9\naIzRfDIRkQqw1pr9HS/1Stdaq68yfA0fPtx7DYn2pddcr3msfpVEwwsiIgFS6IqIBEihGyVpaWm+\nS0g4es2Dp9c8cqak8QdjjC1tfEJERPZkjMFW9EaaiIhEj0JXRCRACl0RkQApdEVEAqTQFREJkEJX\nRCRACl0RkQApdEVEAqTQFREJkEJXRCRACl0RkQApdEVEAqTQFREJkEJXRCRACl0RkQApdEVEAqTQ\nFREJkEJXRCRACl0RkQApdEVEAqTQFREJkEJXRCRACl0RCa/Vq6GoyHcV5aLQFZFw2rIFuneH2bN9\nV1IuCl0RCaf774dWraBbN9+VlIux1h74QWNsSY+LiHixbBl07QpLl8IRR/iuZh/GGKy1Zn+P6UpX\nRMKluBgGDIB77onJwC2NQldEwuXxx8FaF7whpOEFEQmP776D1q3dzbMWLXxXc0AaXhCR8LMWMjLg\nhhtiOnBLU813ASIiZfKPf8BXX8HLL/uuJCIaXhCR2Ld+PbRs6YL31FN9V1OqkoYXFLoiEvuuuAIO\nOQTGjfNdSZmUFLoaXhCR2PbmmzB3Lnz8se9KokKhKyKxa8MGGDgQnnkG6tTxXU1UaHhBRGLXwIGu\noc0TT/iupFw0vCAi4ZOT44YWPvnEdyVRpXm6IhJ7Nm6Ev/wFJk+G+vV9VxNVGl4QkdgzcCBs3w5P\nPeW7kgrR8IKIhEdODrzxhuskFofidnghLS2No48+2ncZADRp0oSrrrrKdxkisS8/H66+Gp58Mu6G\nFX4Tt6EL7hI/FhhjYqYWkZh2881wzjnQs6fvSiqNhhdEJDa8/rrrHrZ0qe9KKlVcX+kGwVrL1q1b\nfZchEm5r17r+uM88A3Xr+q6mUoUydAsKCrj77rtp3rw5tWrVIjk5mZ49ezJv3rx9nrt69Wr69OlD\nvXr1OPTQQ0lPT99vSP773/+mc+fO1KlTh/r163PWWWexcOHCPZ7z9ddfU6VKFe666y6effZZWrRo\nQc2aNZk6dSoAmzZtIiMjg0aNGlG3bl169uzJ559/Xjkvgki8+K0h+aWXQpcuvqupdKEbXti8eTNd\nu3Zl0aJFXHTRRWRmZrJ582bmz5/PnDlz6Ny5867nFhYW0r17d9LS0hg9ejTz58/n8ccfp1GjRtxz\nzz27njdt2jQuvvhiUlJSGDFiBFu3biU7O5suXbrwzjvvcOpeXY2mT5/OunXrdgVsSkoKABdccAE5\nOTn069ePTp06MX/+fHr06MGWLVuCeXFEwuj552HlSpgyxXclwbDWHvDLPRxb7r33XmuMsRMnTizx\neV26dLHGGDtu3Lg9jvfu3dsmJSXt+v327dtt48aN7THHHGPz8/N3Hf/uu+9s3bp1bfv27XcdW7Vq\nlTXG2Fq1atlvv/12j+87Y8YMa4yxt99++x7Hb7vtNmuMsVdddVW5/64ice+bb6xt1MjaxYt9VxJV\nO7Nzv7kauuGFadOm0aRJE9LT00t9btWqVRmw1z5KaWlp/PzzzxQUFACwaNEi1qxZw4ABA6hXr96u\n5x155JFceumlfPjhh6xZs2aP73HOOefsMx3t9ddfB+Dmm2/e4/iQIUPK/pcTSSRFRdC/Pwwe7Lbg\nSRChC90vvviCFmXcqiMpKYkaNWrsca
xBgwYArF+/HnDjtMCuIYLd/XZs1apVexw/7rjj9nnu119/\nTb169UhOTt7jeKNGjTjkkEPKVK9IQhk92o3n3nqr70oCFbox3fLMd61ateoBH7MRLG+uVatWuZ4f\nyZ8lEpc++gjGjIEPPoASfk7jUeiudJs2bcqyZcuiFmRNmjQBYMWKFfs89umnnwL7v7Ld3/fZsGHD\nPkMReXl55OfnR16oSLwoLITLLoOxY+HYY31XE7jQhW7fvn355ptvmDx5clS+3ymnnELjxo2ZPHky\nGzdu3HX8hx9+YMqUKbRr147DDz+81O9z3nnnATB27Ng9jo8ZMyYqdYrEjcGDoW1bN0UsAYVueGHI\nkCG89tprpKenM3v2bDp37sy2bduYP38+bdu25a9//euu55blarhq1aqMHTuWiy++mI4dO3LVVVex\nbds2srOzKSoqYlwZ92Q699xz6d69O6NGjeLHH3+kQ4cOLFiwgNzcXA477DANMYgAvPoqzJwJixf7\nrsSb0F3p1qxZk9zcXG677TYWLlzI4MGDGTlyJAUFBaSlpe16Xkn9DvY+fuGFFzJjxgwaNGjA8OHD\nGTlyJKmpqeTm5u4zR7ckr732GgMGDOCNN95g6NCh5OXlkZOTw8EHH6zeCyKrV0N6upuPu9tMoUSj\nfroiUvmKiqB7d+jRA4YN811NpSupn27ornRFJITuv9/NUrj9dt+VeBe6MV0RCZk5c2DSJPjww4Sb\nHrY/utIVkcqzbh1cfjk8/TQccYTvamKCxnRFpHIUF0Pv3pCSAg8/7LuaQGlM14Pi4mIWLlyoqWKS\nuEaPhvXr4YEHfFcSUxS6laSwsJArr7ySXr167dO7QSTuzZvnlvlOnQrVq/uuJqYodCtJnTp1WLJk\nCaeddhqnnHIKDzzwANu2bfNdlkjl+/lnuOQSt336Mcf4ribmaEw3AKtWrSIzM5Mvv/ySSZMm7bGI\nQySuFBW5jSVbt4aHHvJdjTcljekqdANirWX69OlkZmbu2skiKSnJd1ki0TViBMyaBTk5UC1xZ6Tq\nRloMMMbQp08fVqxYQXJyMi1btiQ7O5vi4mLfpYlEx9tvw+TJbhw3gQO3NLrS9eSTTz4hPT2d7du3\nk52dTZs2bXyXJFJx33wDHTrA//1fQmwuWRpd6cagk046iTlz5jBgwADOOussbrrpJjZs2OC7LJHy\n27IF/vQnGDpUgVsGCl2PqlSpwtVXX83y5cspKCggNTWVadOmaW6vhIe1cP31cMIJMGiQ72pCQcML\nMWTevHkMHDiQI444gscee4wTTjjBd0kiJXv8cRg/Ht5/H+rU8V1NzNDwQkh07tyZjz76iB49etCx\nY0dGjBjBli1bfJclsn/z5sFdd8E//6nALQeFboypXr06t9xyC4sXL2bp0qW0atWKmTNn+i5LZE/f\nfw99+8Lf/w5Nm/quJlQ0vBDjZsyYwY033kjHjh155JFHaNy4se+SJNFt2QJpaa6ZzR13+K4mJml4\nIcR69erF8uXLOe6442jVqhUTJkygqKjId1mSqKyF665zy3t3249Qyk5XuiGyYsUKMjIy2LRpE9nZ\n2bRv3953SZJoRo+GF1+EuXPh4IN9VxOz4utKd9063xV4k5qayqxZs8jMzOS8887j+uuv59dff/Vd\nliSI/Jffxo55xN04U+BWWLhC98cfoUULWL7cdyXeGGPo378/y5cvp6ioiNTUVKZMmaK5vRJV+fmu\nhcLo0a5hWNNjt3Jk387kPTFdncMiFL7hheefh+HD3bzARo18V+PdggULSE9Pp2HDhkycOJETTzzR\nd0kSMvn58NFHbguzDz+ERYvc9c3JJ0O7dtC+2QbaPfhnmj94BVX7X+a73FCIvy5jw4bB7NnwzjtQ\no4bvarzbsWMHEyZM4L777iM9PZ077riDWrVq+S5LYlCpAdve/dq8+c49JLdudVunp6XBfff5Lj80\n4i
90i4vdHMGaNd2Vr9nv3y3hfP/99wwaNIgPP/yQCRMmcPbZZ/suSTwqd8DuzVq3qeS2ba6RTZVw\njUb6FH+hC1BYCF27wllnuR6esst//vMfbrjhBlq3bs3YsWM56qijfJcklSzigN2fu+927RpnzYKd\nn5yycrPISsuqtL9HvIjP0AX46Sfo1MmN8V5xhe9qYsrmzZsZOXIkjz32GMOGDePGG2+kmnqcxoVK\nCdi9/f3v7mJmwQLYrdm+GWGww2M4E2JE/IYuwKefuvGmF190Y0+yh5UrV5KRkcHatWuZNGkSnTp1\n8l2SlEMgAbu3mTPdsMKsWZCausdDCt2yie/QBXdT7cIL3RYhrVr5ribmWGuZOnUqQ4YMoVevXowc\nOZKGDRv6Lkv24iVg97Z0KfToAf/4B5xxxj4PK3TLJv5DF9xA/y23uM5Hmke4X/n5+dx55528/PLL\nPPTQQ/Tv3x+jm5BexETA7u3bb6FzZ7d1et+++32KQrdsEiN0AR55xG37PHcu6ErugBYtWsTAgQM5\n+OCDmThxIi1atPBdUlyLyYDd27p1cPrp8Je/wODBB3yaQrdsEid0wV3tvveeG5eqXdt3NTGrqKiI\n7OxssrKyuOaaa7jrrrs4WEs7IxaKgN1bYaG7H3L66TBqVIlP1eyFskms0C0uhiuvhF9+gVdfherV\nfVcU09asWcOQIUOYN28e48ePp3fv3r5LCo1QBuzetm+HCy6ABg3g2Wc1FzdKEit0wZ1I55/vlgk/\n84xOpDJ45513yMjIICUlhfHjx3OMxsX3EBcBu7ffLlDWroXp03WBEkWJF7rgPjL94Q/up+Fvf9Oq\ntTLYunUro0aNYty4cQwdOpRBgwZRPQF/EOMyYPdmLdx0EyxeDG+9paG4KEvM0AX49Vc3h/ePf3QL\nKKRMvvrqK2644QZWr17NpEmTOP300/d4PCsri5SUFC666CJPFUZPQgTs/owY4Vo0zpoFhxziu5q4\nk7ihC27V2umnQ3q6toguB2str7zyCoMGDeLMM89k1KhRNNrZ1W3RokW7drQ49NBDPVdadgkbsHsb\nMwYmT4Y5cyA52Xc1cSmxQxdg9Wo30fv222HAAN/VhMrGjRsZPnw4L7zwAg888ABXX301VapUITMz\nk8LCQp588knfJe6XAvYAsrPdDIU5c6ACPTk0e6FsFLoAX33lhhruvx/69/ddTegsWbKE9PR0jDFk\nZ2fTpEkTUlNTeemll/YZfgiaAraMnnvOtUXNzYXjj6/Qt9A83bJR6P7ms8/cfMSHH4ZLL/VdTegU\nFxfz5JNPcuedd9KvXz9at27Ngw8+yJIlSzjooIMCqWH3gF20yP2qgC2DF1+EW291PaibN6/wt1Ho\nlo1Cd3fLl8OZZ8LYsRAHN4KCsmPHjl1dyvLy8hg6dCg5OTkkJydzwQUXMGzYsKj/mSUFbPv2ML6h\nYdmfrQK2NNOmuZkKOTluu6sIKHTLRqG7t48/dtPJHn3UNcqREq1cuZKTTz6ZmjVrkpSURFJSEsnJ\nyezYsYPc3Fw2btzI+++/H9HuxKUFbLt2+17BKgDKYNo0yMx0fXGj0AxKr3nZlBS6idlgtVUrNzex\nZ083QVxXvCVq1qwZhYWF/PLLL+Tl5ZGXl8dPP/1EXl4eLVu2JDc3l59++qnM36+0gD3nHLjrLg0R\nRGzqVDdjJ0qBK9GRmKEL7if87bdd8BYVaYy3FMYYGjZsSMOGDWlejjFBBawnL77o+pDMnAktW0bt\n2w7vovnukUrM4YXdLVvmhhoeeMAtiZQKq8gQQST0UfcAnn7a/Uv29tsRj+FKxWh4oSQtW7pVOWee\n6ZYOZ2T4rigUdAUboyZOhJEj3TndrJnvamQ/FLoAJ57odp/o3t0F7y23+K4opihgQ+Lhh2HSJHcu\nH3ec72rkABS6v/nd71zz8x49XFvI++5LyCY5CtgQshbuuMN1Cps7
F4480ndFUgKF7u6OOsotjzz7\nbFi/HiZMiOtkUcDGgaIiuOEG+OADd+4edpjviqQUupG2Pxs2/P9+vM8/DzVq+K4oYkHf5ApCwt9I\n27rV7dq7bp3rGFavXqX/keq9UDZaHFERW7a4E3r9enjtNahf33dFZRaPAbs/CR26GzZAnz5w6KHw\nwguBXRgk9GteDpq9UBE1a7odhjMzoUsXePNNOOII31XtQ0MECej77+Hcc+HUU92qSr2xoaLQLUnV\nqm5c96GHoFMneOONqE40Ly8FrLBsmQvcjAwYOjQhb/aGnUK3NMa4PrzHHAPdusFLL7mpZZVMASv7\nyMmByy5z209pBWVoKXTL6tJL3VScvn3ddLJrr43at1bASqkmT4a773YNbLp08V2NREChWx5dusC7\n77qPd5995jrwlzMFFbBSLkVFbhhhxgw3B7dpU6/lqPdC5DR7oSLWr3ctIWvUcI1FDrCxX6LMIvAl\n7u+k//orXHIJbNsGL78MDRv6rkjKSFPGKsP27TB4sOvi9K9/kZ/cTAEbsLgO3ZUroXdv14xpzBio\nXt13RVIOCt3K9MQT/D6jHSuqteLkttUUsAGK29B9/XW45hq3n18U7x1IcDRPtzJdey3TDlvM0Tce\nT9Uzr4Thw6FKFd9VSRgVF8M998BTT8G//gUdO/quSCqB0iEKmvyxDVUXve/a6Z17Lqxd67skCZu1\na91d1P/+1/VRUODGLYVulGR9lu12Wj3pJDe28N57vkuSsFiwwJ0zJ5/sQvfww31XdEBZuVm+Swg9\nhW6UjJg9wt3sGDXKLc08/3zX37S42HdpEquKi9350rs3jB/vVj5Wi+0RvxGzR/guIfQUupWhd2/3\nEfGf/3QfGcuxaaMkiJ9+cufG9OnuXDn/fN8VSUAUupXl2GNdB//27aFNG9e3QQTcudC6tRtSyM11\n54okjNj+LBN21aq5JcN/+AP07+9+2EaPhtq1fVcmPhQUuNVlb7zhOtidcYbvisQDXekG4YwzYOlS\n2LjRXeHMn++7IgnavHnuRtnGjbBkiQI3gSl0o6TUNen167tdKB56CP70J3fFs3lzMMWJP4WFcOut\n8Oc/uxurzz13wGXjYaDeC5HTijQffv7Z7Wu1eDE88YS6RlVQzK9ImzPHrSxr187NaGnUyHdFEpCS\nVqTpSteHRo3cmN6oUa4/6oABbgdiiQ+//ALXXeea1YweDVOnKnBlF4WuT336uJ0AqlWD1FSYMsVt\npy3hZK17D1NT3ZztFSs0FUz2oeGFWPH++zBwoBv7ffRRt7JNShRTwwuffOKGjDZsgEmTtIw3wWl4\nIQw6dHA9IS+6yG0HlJnpttaW2LZuHdx4o3vPLr7YvYcKXCmBQjdKorImvWpVSE93H0uLiiAlBcaO\ndU2sJbZs3erem5QUN6ywYoV77+K8l6d6L0ROoRslUV2Tfthh8NhjbrXSzJnuB/vFF9XHIRYUF7v3\nIiUF3n7bvUcTJrj3LAGo90LkFLqxLDXVrV566ikYN85NPXr9dd1s88Fa1+O2bVv3Xjz9NLz5pnuP\nRMpBoRsGaWmu/d/w4XDnnfD737swVvhWPmvdppAdOrgdQ7Oy3HuRlua7MgkphW5YGOOmmC1e7Faz\n3XGHa6QzbZob/5XoKipym0G2bQvDhrlVZYsXu/fA7PemtEiZKHTDpkoVtxPxkiWumc7YsXDiiW5c\nsaDAd3XhV1DgxtObNYO//Q3uvde91hdeqG2YJCp0FkVJ4GvSjYFevVzznOeeczsOHHusuyJbtSrY\nWuLBqlVwyy3uNczJcX0y5s93r7GubHdR74XIaXFEPPnf/2DiRHj2WTcGee21bs+2ON2+O+LFEdu3\nu7HxyZNh4UK48krIyIDf/S5qNUpi0hbsiaaw0I31PvmkC+LLL4d+/eJulVuFQ/eTT9yngylTXMBe\nd53rAqY+xxIlWpGWaGrXdldt
777rNsusVs1d8bZuDQ8+CF995bvC4P3vf+7vfvLJbpuc6tXdkMy7\n77oG8wpcCYiudBNFcTHMneuugF95xe04e/757qtNm1COW5Z4pWutuwE2fbrbq+7HH93V7EUXwWmn\n6aaYVCoNL8ieiorcTaLp093Xpk3Qs6fbVqhrV2jc2HeFZbJP6P7wA8ya5VbxvfUW1K3rNgnt0wc6\ndYr7JboSOxS6AcjKzSIrLct3GRXz5ZcupGbOdI23k5Lc1eCpp7qwatYs9gKrqAhzXzXs0U+5rXDm\nzYO8PLdooVs3OPtsOP5431XGnVCf5wFS6AYgptoMRqK4GD7+2IXYe++5r7w8NxbaujW0aOGWvp54\nIiQnV/6whLWwZg188QUsX+76Dy9dCkuXYm7ZhP38Eujc2X2ddFLs/eMQZ+LmPK9kCt0AxPXJ+Msv\nbjXWxx+74Fu+3F0dFxa6u/9HHw1HHeWGJZKS3C4J9etDvXpQpw7UqAEHHeQC0VoX7Nu2wZYtbjFC\nfr77WrvWBfyaNbB6NXz7rZs/W7s2NG3qwr5lSxeubdpgxjeM39c8RsX1eR5FJYWutmCX0jVo4D6y\nd+u25/H8fDcr4LvvXEiuWeOmY/38s2vmvWGD2/1261b39ds/4MZAzZoujGvXdgFdv74L66QkaNXK\nzbY45hi3WCHEGzmK7E2hKxVXv76b+dCmje9KREJD82ZERAKk0I0SrUmXRKDzPHK6kSahpZs6Equ0\nDFhEJEYodEVEAqTQFREJkEJXRCRACt0oycrN8l2CSKXTeR45hW6UjJg9wncJIpVO53nkFLoiIgFS\n6IqIBEihKyISIIWuiEiAFLpRojXpkgh0nkdOvRcktNR7QWKVei+IiMQIha6ISIAUuiIiAVLoiogE\nSKEbJVqTLolA53nkFLpRojXpkgh0nkdOoSsiEiCFrohIgBS6IiIBUuiKiARIoRslWpMuiUDneeTU\ne0FCS70XJFap94KISIxQ6IqIBEihKyISIIWuiEiAFLpRojXpkgh0nkdOoRslWpMuiUDneeQUuiIi\nAVLoiogESKErIhIgha6ISIAUulGiNemSCHSeR069FyS01HtBYpV6L4iIxAiFrohIgBS6IiIBUuiK\niARIoRslWpMuiUDneeQUulGiNemSCHSeR06hKyISIIWuiEiAFLoiIgFS6IqIBEihGyVaky6JQOd5\n5NR7QUJLvRckVqn3gohIjFDoiogESKErIhIgha6ISIAUulGiNemSCHSeR06hGyVaky6JQOd55BS6\nIiIBUuiKiARIoSsiEiCFrohIgBS6UaI16ZIIdJ5HTr0XJLTUe0FilXoviIjECIWuiEiAFLoiIgFS\n6IqIBEihGyVaky6JQOd55BS6UaI16ZIIdJ5HTqEbLat8F5CA9JoHT695xBS60fK17wIS0Ne+C0hA\nX/suIPwUuiIiAVLoiogEqNRlwAHWIiISNw60DLjE0BURkejS8IKISIAUuiIiAVLoiogESKErIiUy\nxhxkjOlvjFlkjDnWdz1hV813ASIS8y4AbgJOAn7xXEvoKXQlphljbgJqA22Bu3EBYIAG1tohPmtL\nFNbaqcaYw4B+1toNvusJOw0vSMwyxmQC/7HWPgi8DeQCzwA1gMs9lpaIugH/9V1EPFDoSiyrZq39\nfOd/HwUsttb+ADwOnO6vrMRijDHAGSh0o0LDCxEwxiQB9wA/AAcDvwKH6WNvdFhrH9ntt2cAb+08\n/p2fihJWa6Ae0NwY0xLoBEy21ub4LSucFLoVZIxpCMwGhltrpxlj6gLf4cYdJYqMMTWADsAdB3h8\nANDcWjso0MISRzegAPivtXa5MWYeMN0Yc6S1tthzbaGj4YWKGwX8aK2dtvP3m4AdwLv+Soofxpjq\nxphuO3/bEXfz7IOdjzUyxuwa07XWPg6cpelMlaYr8Ki1dvnO328CkoET/ZUUXgrdCjDGHAr0A1
7e\n7fBJuBs8i70UFX+uA940xtQCegNrrbU7dj52PTBjr+c/D1wZXHmJwRhTFTgNmLvb4ZSdvxYEX1H4\naXihYjoC1YFZux07A1igj1tRMxt4FTek8CpQYIwZA2wGpllrf93r+c/j3g/tJxNdTYG6wILdjqUB\n31hrv/VSUcgpdCumJlBkrf1st2NnAPOMMQcB11lrJ/gpLT5Ya5cBl+52aF4p/0sSUGSM6WqtnVXK\nc6Xs6gB51tqNAMaYasAfccNrUgEaXqiYBcB2Y0wdAGNMb6A78DnupsMSj7UlHGPMmUBn3E3MqzyX\nE29WADt23swEGI47z3VRUUHqp1tBxph+uI9Z3wIrga3AJcCX1tq/eiwtoRhj+gKnWGtvNcbUBL4E\nUn67MpPIGWPOB87B3UCzwJ3W2i1+qwovha6EljGmAzDAWnv1bsceAlZreEdilUJXRCRAGtMVEQmQ\nQldEJEAKXRGRACl0RUQCpNAVEQmQQldEJEAKXRGRACl0RUQCpNAVEQnQ/wMvDyGhLaSDiAAAAABJ\nRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7f74c9d7ba50>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import scipy as sp\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "\n",
    "xx = np.linspace(-3, 4, 100)\n",
    "yy = xx ** 2\n",
    "\n",
    "ax.plot(xx, yy, 'r')\n",
    "ax.set_xlim(-4, 5)\n",
    "ax.set_yticks([])\n",
    "ax.set_ylim(-10, 20)\n",
    "\n",
    "ax.set_xticks([-2, 0.5, 3])\n",
    "ax.set_xticklabels([r\"$a$\", r\"$x_\\lambda$\", r\"$b$\"], fontsize=\"xx-large\")\n",
    "\n",
    "for i in [-2, 3]:\n",
    "    ax.plot([i, i], [-10, i ** 2], 'g--')\n",
    "\n",
    "ax.plot([-2, 3], [4, 9], 'b')  \n",
    "ax.plot([0.5, 0.5], [-10, 6.5], 'g')\n",
    "ax.annotate(\n",
    "    \"chord\",\n",
    "    fontsize=\"xx-large\",\n",
    "    xytext=(-3, 12),\n",
    "    xy=(-0.2, 6),\n",
    "    arrowprops=dict(arrowstyle=\"->\",\n",
    "                    connectionstyle=\"arc3\")\n",
    "           )\n",
    "\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在 $x=a$ 和 $x=b$ 的区间内的任意点 $x$ 都可以写成 $\\lambda a +(1-\\lambda)b, 0\\leq\\lambda \\leq 1$，对应的弦上的值为 $\\lambda f(a) +(1-\\lambda) f(b)$，函数值为 $f(\\lambda a +(1-\\lambda)b)$。凸性等价于\n",
    "\n",
    "$$\n",
    "f(\\lambda a +(1-\\lambda)b) \\leq \\lambda f(a) +(1-\\lambda) f(b)\n",
    "$$\n",
    "\n",
    "这也等价于函数的二阶导数非负（如果二阶导数存在）。\n",
    "\n",
    "证明如下：\n",
    "\n",
    "- 必要性\n",
    "\n",
    "假设凸性的式子成立，令 $\\lambda = \\frac 1 2, b = a + 2\\epsilon$，则有：\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "\\frac 1 2f(a)+\\frac 1 2f(b) & \\geq f\\left(\\frac 1 2a + \\frac 1 2b\\right) \\\\\n",
    "& = \\frac 1 2 f\\left((\\frac 1 2a + \\frac 1 2b\\right) + \\frac 1 2 f\\left((\\frac 1 2a + \\frac 1 2b\\right) \\\\\n",
    "& = \\frac 1 2 f(a+\\epsilon) + \\frac{1}{2} f(b-\\epsilon)\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "从而\n",
    "\n",
    "$$\n",
    "f(b) - f(b-\\epsilon) \\geq  f(a+\\epsilon) - f(a)\n",
    "$$\n",
    "\n",
    "取 $\\epsilon\\to 0$，我们有 \n",
    "\n",
    "$$\n",
    "f'(b) \\geq f'(a)\n",
    "$$\n",
    "\n",
    "从而\n",
    "\n",
    "$$\n",
    "f''(x) \\geq 0\n",
    "$$\n",
    "\n",
    "- 充分性\n",
    "\n",
    "假设二阶导数非负：$f''(x) \\geq 0$，那么由泰勒展开，存在一个 $x^\\star$ 使得\n",
    "\n",
    "$$\n",
    "f(x) = f(x_0) + f'(x_0)(x-x_0) + \\frac 1 2 f''(x^\\star) (x-x_0)^2 \\geq f(x_0) + f'(x_0)(x-x_0)\n",
    "$$\n",
    "\n",
    "令 $x_0 = \\lambda a +(1-\\lambda b), x = a$，则有\n",
    "\n",
    "$$\n",
    "f(a) \\geq f(x_0) + f'(x_0)((1-\\lambda)(a-b))\n",
    "$$\n",
    "\n",
    "令 $x_0 = \\lambda a +(1-\\lambda b), x = b$，则有\n",
    "\n",
    "$$\n",
    "f(b) \\geq f(x_0) + f'(x_0)(\\lambda(a-b))\n",
    "$$\n",
    "\n",
    "合并两个不等式，去掉 $f'(x_0)$ 的项：\n",
    "\n",
    "$$\n",
    "\\lambda f(a) +(1-\\lambda) \\geq f(b) f(\\lambda a +(1-\\lambda)b)\n",
    "$$\n",
    "\n",
    "凸函数的例子：\n",
    "\n",
    "- $x\\ln x$\n",
    "- $x^2$\n",
    "\n",
    "如果等式只在 $\\lambda = 1$ 和 $\\lambda = 0$ 时成立，则称 $f(x)$ 是严格凸函数。\n",
    "\n",
    "如果 $f(x)$ 是凸函数，那么 $-f(x)$ 是凹函数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Jensen 不等式 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用数学归纳法不难证明，对于任意凸函数 $f(x)$，`Jensen` 不等式成立：\n",
    "\n",
    "$$\n",
    "f\\left(\\sum_{i=1}^M \\lambda _i x_i\\right) \\leq \\sum_{i=1}^M\\lambda_i f(x_i)\n",
    "$$\n",
    "\n",
    "其中 $\\lambda_i \\geq 0, \\sum_i \\lambda_i = 1$。等式成立的条件为 $f$ 是线性，或者所有的 $x_i$ 都相等。\n",
    "\n",
    "证明如下：\n",
    "\n",
    "$M = 1$ 时显然成立；假设对于某个 $M$ 不等式成立，那么对于 $M+1$ 来说：\n",
    "\n",
    "$$\n",
    "f\\left(\\sum_{i=1}^{M+1} \\lambda_i x_i\\right) = f\\left(\\lambda_{M+1} x_{M+1} + \\sum_{i=1}^M \\lambda _i x_i\\right) = f\\left(\\lambda_{M+1} x_{M+1} + (1-\\lambda_{M+1})\\sum_{i=1}^M \\eta_i x_i\\right) \n",
    "$$\n",
    "\n",
    "其中 \n",
    "\n",
    "$$\n",
    "\\eta_i = \\frac{\\lambda_i}{1-\\lambda_{M+1}}\n",
    "$$\n",
    "\n",
    "考虑凸函数性质，我们有\n",
    "\n",
    "$$\n",
    "f\\left(\\sum_{i=1}^{M+1} \\lambda_i x_i\\right) = f\\left(\\lambda_{M+1} x_{M+1} + (1-\\lambda_{M+1})\\sum_{i=1}^M \\eta_i x_i\\right) \\leq \\lambda_{M+1} f(x_{M+1}) + (1-\\lambda_{M+1}) f\\left(\\sum_{i=1}^M \\eta_i x_i\\right)\n",
    "$$\n",
    "\n",
    "再应用假设条件，我们有\n",
    "$$\n",
    "f\\left(\\sum_{i=1}^{M+1} \\lambda_i x_i\\right) \\leq \\lambda_{M+1} f(x_{M+1}) + (1-\\lambda_{M+1}) f\\left(\\sum_{i=1}^M \\eta_i x_i\\right) \\leq \\lambda_{M+1} f(x_{M+1}) + \\sum_{i=1}^M (1-\\lambda_{M+1})\\eta_i f\\left(x_i\\right) = \\sum_{i=1}^{M+1} \\lambda_i f(x_i)\n",
    "$$\n",
    "\n",
    "如果我们认为 $\\lambda_i$ 是某个离散分布取 $x_i$ 的概率，那 `Jenson` 不等式可以表示为\n",
    "\n",
    "$$\n",
    "f\\left(\\mathbb E[\\mathbf x]\\right) \\leq \\mathbb E[f(\\mathbf x)]\n",
    "$$\n",
    "\n",
    "不等式成立的条件为 $x$ 为常数，或者 $f$ 是线性的。\n",
    "\n",
    "更一般的，我们有：\n",
    "\n",
    "$$\n",
    "f\\left(\\mathbb E[y(\\mathbf x)]\\right) \\leq \\mathbb E[f(y(\\mathbf x))]\n",
    "$$\n",
    "\n",
    "对于连续形式，我们有\n",
    "\n",
    "$$\n",
    "f\\left(\\int y(\\mathbf x)p(\\mathbf x)d\\mathbf x\\right) \\leq \\int f(y(\\mathbf x))p(\\mathbf x)d\\mathbf x\n",
    "$$"
   ]
  },
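  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Jensen's inequality can be spot-checked with random convex combinations, here using the convex function $f(x)=x\\ln x$ (a minimal sketch assuming `numpy`'s `Generator` API):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "f = lambda x: x * np.log(x)  # convex on (0, inf)\n",
    "\n",
    "for _ in range(5):\n",
    "    x = rng.uniform(0.1, 5.0, size=10)  # points x_i\n",
    "    lam = rng.dirichlet(np.ones(10))    # weights lambda_i >= 0 summing to 1\n",
    "    lhs = float(f(np.dot(lam, x)))      # f(sum_i lambda_i x_i)\n",
    "    rhs = float(np.dot(lam, f(x)))      # sum_i lambda_i f(x_i)\n",
    "    assert lhs <= rhs\n",
    "    print(lhs, rhs)\n",
    "```"
   ]
  },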
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### 应用 Jensen 不等式"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "将 `Jensen` 不等式应用到 `K-L` 散度（$-\\ln x$ 是凸函数）上：\n",
    "\n",
    "$$\n",
    "KL(p||q)=-\\int p(\\mathbf x)\\ln\\left\\{\\frac{q(\\mathbf x)}{p(\\mathbf x)}\\right\\} d\\mathbf x\n",
    "\\geq -\\ln \\int \\left\\{p(\\mathbf x) \\frac{q(\\mathbf x)}{p(\\mathbf x)}\\right\\} \n",
    "= -\\ln \\int q(\\mathbf x)d\\mathbf x = 0\n",
    "$$\n",
    "\n",
    "因为 $-\\ln x$ 是严格凸的，因此等式只能在 $\\frac{q(\\mathbf x)}{p(\\mathbf x)}$ 为常数时成立，考虑到两者都是概率分布（积分都为 1），所以只能有 $p(\\mathbf x) = q(\\mathbf x)$ 成立。\n",
    "\n",
    "因此，我们可以将 `K-L` 散度来衡量两个分布的差异。\n",
    "\n",
    "另一方面，我们看到，当我们使用真实的数据分布传送数据，所用信息量的期望是最小的。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 最小化相对熵"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "假设我们有一组从未知分布 $p(\\mathbf x)$ 中产生的数据，我们的模型为 $q(\\mathbf{x|\\theta})$，一个决定参数 $\\mathbf \\theta$ 的方法就是最小化 $p(\\mathbf x)$ 和 $q(\\mathbf{x|\\theta})$ 的 `K-L` 散度。\n",
    "\n",
    "然而我们并不知道 $p(x)$ 的真实分布，一个近似的方法是使用已观测的数据进行模拟：\n",
    "\n",
    "$$\n",
    "KL(p|q) \\approx \\sum_{n=1}^N (-\\ln q(\\mathbf{x}_n | \\mathbf{\\theta}) + \\ln p(\\mathbf x_n))\n",
    "$$\n",
    "\n",
    "由于右边的部分与 $\\mathbf\\theta$ 无关，因此，最小化 `K-L` 散度就相当于最大化似然函数。"
   ]
  },
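  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small simulation of this equivalence (a minimal sketch: we draw samples from a distribution playing the role of the unknown $p$, take the model $q(x|\\theta)=\\mathcal N(x;\\theta,1)$, and minimize the empirical negative log likelihood over a grid; all names here are ours):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(1)\n",
    "data = rng.normal(loc=2.0, scale=1.0, size=1000)  # draws from the 'unknown' p(x)\n",
    "\n",
    "def neg_log_lik(theta, x):\n",
    "    # -sum_n ln q(x_n | theta) for the model q(x | theta) = N(x; theta, 1)\n",
    "    return 0.5 * np.sum((x - theta) ** 2) + 0.5 * len(x) * np.log(2 * np.pi)\n",
    "\n",
    "thetas = np.linspace(0.0, 4.0, 4001)\n",
    "best = thetas[int(np.argmin([neg_log_lik(t, data) for t in thetas]))]\n",
    "\n",
    "print(best, data.mean())  # the minimizer is (up to grid resolution) the sample mean\n",
    "```"
   ]
  },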
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 互信息"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "考虑两个变量 $\\mathbf{x,y}$ 的分布，如果它们独立，则 $p(\\mathbf{x,y}) = p(\\mathbf{x})p(\\mathbf{y})$，如果他们不独立，则我们用 $p(\\mathbf{x,y})$ 和 $p(\\mathbf{x})p(\\mathbf{y})$ 的 `K-L` 散度来衡量它们接近独立的程度：\n",
    "\n",
    "$$\n",
    "I[\\mathbf{x,y}] \\equiv KL(p(\\mathbf{x,y})||p(\\mathbf{x})p(\\mathbf{y}))\n",
    "= - \\iint p(\\mathbf{x,y}) \\ln \\frac{p(\\mathbf{x})p(\\mathbf{y})}{p(\\mathbf{x,y})} d\\mathbf xd\\mathbf y\n",
    "$$\n",
    "\n",
    "这叫做 $\\mathbf x$ 和 $\\mathbf y$ 的互信息（`mutual information`）。从 `K-L` 散度的性质中我们可以看出，$I[\\mathbf{x,y}] \\geq 0$，当且仅当 $\\mathbf{x,y}$ 是独立的时候等号成立。\n",
    "\n",
    "另外，我们从定义式中可以得到：\n",
    "\n",
    "$$\n",
    "I[\\mathbf{x,y}] = H[\\mathbf x] - H[\\mathbf{x|y}] = H[\\mathbf y] - H[\\mathbf{y|x}]\n",
    "$$\n",
    "\n",
    "所以我们可以将互信息看成是给定 $\\mathbf y$ 后 $\\mathbf x$ 信息量的减少值。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
